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ABSTRACT : 

The Winograd procedure for matrix multiplication [S. Winograd, 
Comm, on Pure and Applied Math. . 23, 1970] provides a method 
whereby general matrix products may be computed more efficiently 
than the normal method. In this report, we describe 'he algorithm 
and the time savings that can be effected. ' FORTRAN program is 
provided which performs a general matrix multiply according to this 
algorithm. 

Additionally, we describe a variation of this procedure that may 
be used to calculate Gaussian probability density functions. It is 
shown how a time savings can be effected in this calculation. The 
extension of this method to other similar calculations should yield 
similar savings. 
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I. Introduction : 
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In this paper, we show the Winograd* procedure for 

computing matrix products can be applied to various calculations 

used in digital processing of remotely sensed data. The basic 

procedure is described, a FORTRAN program for general matrix 

multiplication is provided, and an example (computation of Gaussian 

probability density functions) is worked out showing rhe regions 

where it 'mputationally faster and rhe amount of rime savings 

involved. Of course, there are many other calculations where this 

procedure can be effectively utilized. 

Essentially, the Winograd procedure effects a time savings in 

computing matrix products by trading off some of rhe multiplications 

involved for additions (multiplies are usually slower operations on a 

computer than adds). A relatively small amount of additional storage 

is required, bur a significant decrease in the mumber of multiplies 

(up to a factor of 2 for the case of multiplying two n x n matrices, 

if n is sufficiently large) can be gained. However, one must 

( 2 ) 

sacrifice some numerical stability in employing this procedure' \ 

In the next section, we describe the general matrix multipli- 
cation procedure developed by Winograd and compare it with the 
standard method. In Section III, we uescribe a variation of this 
procedure for the computation of Gaussian probability density 
functions. An appendix containing a FORTRAN program for general 
matrix multiplication is provided. 
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11. The Winograd Algorithm : 

Let As(a.j) be a pxq matrix, B = ( b 4 . ) , a q x s 
matrix, C*(c.j), a pxs matrix, x = (x j ), a q-vector, 
and y-(y ), a p-vector. The standard method for computing 
y = A x is 


I ■, 

j = l 


j X j 


1 = 1,2, . . , p 


( 1 ) 


and for computing C=AB is 


, , = S a . b . 
ij L iq qj 

t~\ 


i - 1 , 2 , . . . , p 
j = l , 2, . . . , s 


( 2 ) 


Thus to compute y in this manner, p q multiplies and p(q- 1 ) 
adds are necessary; and to compute C, p q s multiplies and 
p s(q - 1 ) adds. 

The Winograd procedure consists of rewriting cqs. (I) and (2) 
so that jorne quantities are precomputed. ITiis procedure is based 
on the identity 


a ik b k + a i, k-M b k+ 1 


" (a ik + b k + 1 )(a i,k+l 


+ b , ) - a 


ik i , k+1 


b k b k + 1 


Similar identities of a higher order may be used to construct other 
algorithms, but, for oui purposes here, these are not of much 
interest. Following Winograd's notation, we let |d] = largest 
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integer < d and fdl = smallest integer > d, with i = [JqJ, 
the Wi nog rad procedure, then, for computing y 


and 


T. a i , 2 u - 1 • a i,2u 1-1,2 P ( 3 a > 


u = 1 
l 

* K 1 x 

u= 1 


2u - 1 ' x 2u 


(3b) 


r 4 




I (a i,2u-l 4 x 2u )(a i,2u + x 2u-l ) 
u = 1 


5 j - H if q = 2e 

I (n i,2u-l + x 2u^ a i,2u + x 2u-l ) 


u = 1 


5 t - H - a iq x q if ^ = + 1 


(3c) 


This algorithm requires p |£q| + jt(p + l) multiplications and 
P ( Uq I +jf-l adds. If we have t y's to compute using the 
same A and t x 's , the operation counts become 

(p |£q | + l ) multiplies and p ( £ - 1 ) +t • - 1 + 

p(2«+ 1 + |lq|)^ adds, since the r.’s need not be 
recomputed. Table 1 shows the approximate (ignoring indexing, 
etc. ) time savings to be expected for computing 100 y's for 
various values of p and q for a ratio of the machine multiply 
time T to the add time T of 2.7 (the approximate value 
for an IBM 370/155). We see from this table that for low values 



(Note: I .-r i Tings always occur if 



5 


of p and q, there is a net loss ' M e for large values, up to 
a 16% savings results. 

The extension of this concept to hill matrix multiplication is 
(for C = A B) 




■ l V 


2u - 1 i , 2 u 


u = 1 


(4a) 


y b 


U=1 


2 u - 1 , j h 2u,j 


(4b) 


r i 


(a i,2u-l + b 2u,j )(a i,2u + b 2u-l,j) 


C U =< 


u = 1 



*i 

- ’j 

if c] = 2 £ 

£ 

I 

<a t. 

2 u - 1 + 

U = 1 



*i 


+ a iq b < 

for 

i = 1 , 

2 , . . . , p 


2u,j )(a i,2u + b 2u-1,j ) 


if q = 2 £ + 1 


This requires p s [fq] + t (p+s) multiplies and (£-l)(p+s) + 

P s ([*qjH-2+1) adds. For p = q = s = n and n large, 

this reduces to ~ * n + n multiplies and ^ n + n adds, 

which can effect a savings in computation time over the standard 

3 3 2 

procedure which requires n‘ multiplies and n‘ - n adds; 
using the ratio of 2.7:1 for multiplies to adds, in fact a savings 
always occurs for n > 5 . 
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A FORTRAN subroutine WN0MUL which performs matrix 
multiplication according to this algorithm is listed in the appendix. 
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HI. Calculation of Gaussian Probability l>ensity Functions : 

The n-dtmensi anal probability de dty for a normal population 
with mean u and covariance K is given by 


- i 


f ( x ) = (2”) “■ | K | ' exp [ ( x - u ) 1 K " 1 ( x - u ) J 


In this section, we shall concern ourselves only with the calculation 
of the quadratic form (the argument of exp), given u and L and 
D where L and 1) are the modified Cholesky decomposition^ 
of K, i.e. K = LDL 1 , with L being unit lower triangular 
and D, diagonal with positive diagonal entries. Then we can write 

, T 


( x - u ) 1 K ' 1 ( x - u ) = ( x - LI ) 1 L 


1 ’ 


I) -1 L ’ 1 ( x - u ) 


= y 1 I) 
n 


Y y{ 

i = 1 


2 / d. 


where 


y = L* 1 (x-u ) 


(5) 


(6) 


Eq. (6) can be solved by forward substitution, i.e. 


y\ = (x i • u i ) ' I 

j=i 


y j 


(7) 


where Iv = (i .). We then can use the Winograd procedure on the 
> J 2 

summation term in eq. (7). (The standard method requires H — -L-IL 

multiplies and ^ n ' 2 ) ( n - I _ ) ajjg t G evaluate this term). We note 


2 
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that for evaluating this expression for more than one x, we can 
use a variation of the algorithm for full matrix products. 

Taking special note of the struct, e of L, we then use 
r , 


i 

’i = Z £ i, 2s-l *1, 2s 
s = 1 


i 

71 i I y 2 s - 1 y 2s 
8 = 1 


> i = 3 , 4 n (8) 


with 


i- 1 


r. = J 2 j« This is equivalent to 


"3 * y l y 2 


j = 2 , 3 [ i 


(9) 


11 2J - ^ 2 J - 1 

11 2 j -f- 1 = 71 2j - 1 + y 2 j - 1 y 2 j j*2, 3, . . . , | 111 J 

We then have, with 

i- 1 

a . - V £ . . y . 

l L t j 7 i 

j = l 


a 2 = 1 2 1 y 1 





9 
r i 

J (£ i, 2j- I + y 2) ) ( £ i, 2J + y 2J- 1 * 

J = 1 

5 j - T)j if i odd 

r i 

I (£ i, 2J- 1 + y 2j ) (£ i, 2J + y 2j- 1 > 

J = 1 

^ i ’ 11 i + £ i , i - 1 y i-l ifi even 

i = 3 , 4, . . . , n (10) 

Since r y depends only on L , it can be used in evaluating eq. (7) 
for various x. Table 2 shows the number of operations necessary 
for computing each of the various parts of the quadratic form. Note 
that the Wi nog rad algorithm is actually slower when evaluating this 
expression for Just one x , but for more than one x , the pre- 
computed values of ? may be used. Table 3 shows the ratio of 
times for the two methods for the case of T m /T g = 2.7 for 
computing m of the quadratic terms for various values of n. 

Note that a net savings occurs only in the lower right region. 

Also shown in Table 4 is the minimum ratio of T_/T for the 
Winograd procedure to be faster for computing m quadratic terms. 

Often in remote sensing calculations (e. g. maximum likelihood 
classification), many probability density functions must be evaluated 
over the same se>. of data vectors. In this case, one may pre- 
compute both the p ’s and the 7) 's . Table 5 shows the asymp- 
totic ( ^ precomputed and 7 ] computed only for the first class) 
time savings to be expected for T /T g = 2.7 for k classes 
for various values of n. Note that a net savings results for n > 4. 
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IV. Conclusions: 

The Wi nog rad matrix multiplication scheme can produce a time 
savings in various computations using large order matrices in the 
digital processing of remotely sensed data. We have shown how a 
modification of this procedure may be applied to the calculation of 
Gaussian probability density functions, and indicated how this may be 
extended to other computations. F ; or large dimensions, or a large 
number of points, there can be some time savings, but the user 
should determine the expected rime difference using the value of 
T /T g of the computer to be used. A further study should be 
undertaken to investigate the effects of decreased numerical stability 
of this algorithm in verious applications. 
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SUBROUT I N * 7 • NOMUL ( A.P ,U , I A ,R , S. I B.C , I C • T MP ) 

THIS ROUTINE USES Th c wINCGRAD PRnC r OU f » r T 3 MOLT|OLY T *0 MATRICFS 
lilt C=A*-3 miHFor a t s p • a; o, o * s ; & c , p • s. 

IA IS TH= LFNGTh OF THP COLS OF A AS DIMENSION r r> IN TH C MAIN PROG. 
IB & IC ARE SIMILAR QUANTITIES FOR 0 L C 
C MUST NOT OCCUPY THF SAMF STCMAgc AS A OR rt 
TMP IS WORKING STORAGE O c LENGTH .G^.SA? 

INTFQro r>, Q, S.U.U1 .U2 

RF AL A ( I A. Q) , H< IB. S ) . C< IC • S ) . TMP { I ) 

OOUtiL c PRECISION SS.Sl.S2 

LOGICAL ODD 

IFTA-'P 

L*Q/-2 

COMPUTE THF X I * S 

DO 10 1*1, P 

SS=-A( I .1 ) * A ( I ,21 
IF (L.LT.2) GO r t) io 
DO 15 U*2.L 
UI =2 *U 
U2*U 1-1 

15 SS=SS-A ( I ,U? )*A( I.ul ) 

! 0 T MP( I ) = SS 

COMPUTF THC pTA«S 

DO 20 JM.S 

SS = -rt( 1 . J ) * J< 2. J » 

IF <L.L T.2 | GO TO 20 
00 25 U=2.L 
U 1 =2 *U 
U2=U 1-1 

25 SS = SS— H ( U2 . J )*BC Ul .J) 

■»0 TMP( I FT A ♦ J ) = SS 
OOD = .F AL St • 

IF (J^L.NF.O) OOOc.TPUf:. 

COMPUTF THF C( I . J ) 'S 



00 

30 

I =i . p 




SI = 

TMO( T ) 




S 3 = 

A ( I . G ) 




no 

30 J = 1 . S 





SS=S 1 ATMP( IETA* J ) 





ir ( .NRT.POO ) GC TO 

J7 




SS=SS*S-**B(0 . J) 


37 



or 35 U = 1 .L 





Ul=2*U 





U ? = U 1 — 1 


35 



SS-=S ;♦ < A< i ,U2) FH(U1 , 

, J) > *< A( I ,U1 > fb (U? . J ) » 

30 

c< 

I . j ) 

= SS 



RF TURN 
CND 


ORIGINAL PAGEJS 
OF POOR QUALITY 
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